Abstract

This report analyses the effect of depth on the price of diamonds. It is generated automatically from the 'ggplot2::diamonds' dataset.
It is available in HTML, Word and PDF formats, all compiled from the same R Markdown script shown below.
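The sketch below shows how one source can produce all three formats. It assumes the rmarkdown package is installed (with pandoc, and LaTeX for PDF), and it assumes the report source is saved as `auto_report.Rmd`. That file name is hypothetical and not given above; the loop does nothing if the file is missing.

```r
# Compile one R Markdown source into HTML, Word and PDF.
# ASSUMPTION: "auto_report.Rmd" is a hypothetical file name for this report.
RMD_FILE <- "auto_report.Rmd"
OUTPUT_FORMATS <- c("html_document", "word_document", "pdf_document")

if (file.exists(RMD_FILE) && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (strFormat in OUTPUT_FORMATS) {
    # Each call knits the same script; only the output format changes
    rmarkdown::render(RMD_FILE, output_format = strFormat)
  }
}
```

If all three formats are declared in the YAML header, `rmarkdown::render(RMD_FILE, output_format = "all")` does the same in a single call.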

Methodology

Method 1: Putting code inside R Markdown

The code chunk below extracts a subset of the data (defined by the 'CLARITY', 'SAMPLE_SIZE' and 'RANDOM_SEED' parameters), can write it to a .csv file, and produces graphs that visualize this subset.

NB: This code is written so that it can be reused inside a 'for' loop or within an interactive application, where the 'SAMPLE_SIZE' and 'RANDOM_SEED' parameters can be changed automatically (in a loop) or by the user (in an app).
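For example, the subsetting code can be re-run inside a 'for' loop. This sketch assumes the data.table and ggplot2 packages are installed. The values in SAMPLE_SIZES and RANDOM_SEEDS are illustrative, and the subsets are only collected in a list rather than written to disk.

```r
library(data.table)

dt <- as.data.table(ggplot2::diamonds)
CLARITY <- "I1"
SAMPLE_SIZES <- c(50, 150)  # illustrative values to loop over
RANDOM_SEEDS <- c(1, 99)

lstSubsets <- list()
for (SAMPLE_SIZE in SAMPLE_SIZES) {
  for (RANDOM_SEED in RANDOM_SEEDS) {
    set.seed(RANDOM_SEED)
    # Same subsetting code as in the report; only the parameters change
    dt1 <- dt[get("clarity") == CLARITY][sample(.N, SAMPLE_SIZE)][order(get("price"))]
    strTitle <- sprintf("Effect of Depth on Price for Clarity '%s' (size=%02g, seed=%02g).csv",
                        CLARITY, SAMPLE_SIZE, RANDOM_SEED)
    lstSubsets[[strTitle]] <- dt1  # keep each subset, keyed by its would-be file name
  }
}
names(lstSubsets)
```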

Note the following:

  • naming of constants (ALL_CAPITALS) and variables (hungarianNotation)
  • the use of data.table: setDT() and chaining with [][]
  • how the code is structured (spacing, use of =, <-, %>%, .[])
  • dataset fields are always referenced as quoted strings ("") so that they can be replaced by a variable when needed (e.g. when selected by the user in an interactive app)
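Here is a short sketch of why quoted field names pay off. Suppose the names are held in variables, as they might be when chosen from an app's drop-down menus. The names strFilterField, strGroupField and strValueFields are hypothetical. data.table then accepts these variables in all three parts of dt[i, j, by].

```r
library(data.table)

dt <- as.data.table(ggplot2::diamonds)

# Field names held in variables (hypothetical, e.g. chosen by a user in an app)
strFilterField <- "cut"
strFilterValue <- "Ideal"
strGroupField  <- "color"
strValueFields <- c("price", "depth")

# get() in i, a character vector in by, and .SDcols all work with quoted names
dtSummary <- dt[get(strFilterField) == strFilterValue,
                lapply(.SD, mean),
                by = strGroupField,
                .SDcols = strValueFields]
setorderv(dtSummary, strGroupField)  # sort by a column given as a string
dtSummary
```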
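# Below is the code pasted from `auto_report_code1.R`
# Packages used below (in the report they can be loaded in a setup chunk)
library(data.table); library(ggplot2); library(magrittr)

dt <- ggplot2::diamonds %>% setDT %>% .[order(get("price"))] %>% setcolorder("price"); dt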
##        price carat       cut color clarity depth table    x    y    z
##     1:   326  0.23     Ideal     E     SI2  61.5    55 3.95 3.98 2.43
##     2:   326  0.21   Premium     E     SI1  59.8    61 3.89 3.84 2.31
##     3:   327  0.23      Good     E     VS1  56.9    65 4.05 4.07 2.31
##     4:   334  0.29   Premium     I     VS2  62.4    58 4.20 4.23 2.63
##     5:   335  0.31      Good     J     SI2  63.3    58 4.34 4.35 2.75
##    ---                                                               
## 53936: 18803  2.00 Very Good     H     SI1  62.8    57 7.95 8.00 5.01
## 53937: 18804  2.07     Ideal     G     SI2  62.5    55 8.20 8.13 5.11
## 53938: 18806  1.51     Ideal     G      IF  61.7    55 7.37 7.41 4.56
## 53939: 18818  2.00 Very Good     G     SI1  63.5    56 7.90 7.97 5.04
## 53940: 18823  2.29   Premium     I     VS2  60.8    60 8.50 8.47 5.16
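The same table can also be built without pipes. In this sketch, as.data.table() returns a copy and leaves ggplot2::diamonds untouched. data.table's by-reference setters setorderv() and setcolorder() then reorder it, and both take column names as strings.

```r
library(data.table)

# Same table as above, built with data.table's by-reference setters instead of pipes
dt <- as.data.table(ggplot2::diamonds)  # a copy; ggplot2::diamonds itself is not modified
setorderv(dt, "price")                  # sort rows by a column given as a string
setcolorder(dt, "price")                # move 'price' to the first column
dt
```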
# Constants that we want to be able to modify (either automatically in a loop, or manually in an interactive app) ----
SAMPLE_SIZE = 150
RANDOM_SEED = 99; set.seed(RANDOM_SEED)
CLARITY = (dt$clarity %>% unique %>% sort)[1] # "I1" (first level of the ordered factor)

# Subset data ----

# dt1 <- dt[clarity==CLARITY] [sample(.N, SAMPLE_SIZE)] [order(price)];  
# If a data field (e.g. "clarity") can be modified by the user, it should be coded as shown below
dt1 <- dt[get("clarity")==CLARITY] [sample(.N, SAMPLE_SIZE)] [order(get("price"))]; 
dt1
##      price carat       cut color clarity depth table     x     y    z
##   1:   452  0.43   Premium     H      I1  62.0  59.0  4.78  4.83 2.98
##   2:   468  0.32      Good     D      I1  64.0  54.0  4.36  4.33 2.78
##   3:   491  0.40      Good     F      I1  63.3  60.4  4.64  4.68 2.95
##   4:   511  0.39 Very Good     E      I1  62.8  57.0  4.61  4.66 2.91
##   5:   584  0.50      Fair     F      I1  69.8  55.0  4.89  4.80 3.38
##  ---                                                                 
## 146: 11548  3.00      Good     E      I1  64.2  65.0  9.08  8.96 5.79
## 147: 11594  2.72     Ideal     H      I1  59.6  55.0  9.17  9.13 5.45
## 148: 15223  4.01   Premium     J      I1  62.5  62.0 10.02  9.94 6.24
## 149: 15984  4.00 Very Good     I      I1  63.3  58.0 10.01  9.94 6.31
## 150: 18531  4.50      Fair     J      I1  65.8  58.0 10.23 10.16 6.72
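To reuse the subsetting step, it can be wrapped in a function. The helper getSubset() below is hypothetical and not part of auto_report_code1.R. It also caps the sample size at the number of available rows, because sample() without replacement fails when asked for more rows than exist. Calling it twice with the same seed should return the same subset.

```r
library(data.table)

# Hypothetical helper wrapping the subset step for use in a loop or an app
getSubset <- function(dtIn, strField, value, nSample, nSeed) {
  set.seed(nSeed)  # note: this resets the global random number generator
  dtOut <- dtIn[get(strField) == value]
  # Guard: never request more rows than the filtered table has
  dtOut <- dtOut[sample(.N, min(.N, nSample))]
  dtOut[order(get("price"))]
}

dt <- as.data.table(ggplot2::diamonds)
dtA <- getSubset(dt, "clarity", "I1", 150, 99)
dtB <- getSubset(dt, "clarity", "I1", 150, 99)
isTRUE(all.equal(dtA, dtB))  # same seed, same subset
```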
strTitle <- sprintf("Effect of Depth on Price for Clarity '%s' (size=%02g, seed=%02g).csv", CLARITY, SAMPLE_SIZE, RANDOM_SEED)

# Change this to 'T' (TRUE) to start writing the file to your disk
if(F) {
 fwrite(dt1, strTitle)
}
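The write step can be tried safely in a temporary folder before switching the flag to 'T'. In this sketch a 10-row slice of diamonds stands in for dt1. Note that '%02g' in the title pads single-digit numbers with a leading zero, so 5 becomes "05".

```r
library(data.table)

dt1 <- as.data.table(head(ggplot2::diamonds, 10))  # small stand-in for the report's subset
strTitle <- sprintf("Effect of Depth on Price for Clarity '%s' (size=%02g, seed=%02g).csv",
                    "I1", 10, 99)

# Write into the session's temporary folder instead of the working directory
strPath <- file.path(tempdir(), strTitle)
fwrite(dt1, strPath)

# Read the file back to check what was written
dtBack <- fread(file = strPath)
dtBack
```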

# Two ways of plotting variables ----

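# NB: aes_string() is deprecated in recent ggplot2 versions; aes(x = .data[["depth"]], ...) is the current equivalent
g <- ggplot(dt1) + theme_bw() +
  geom_point(aes_string(x = "depth", y = "price", col = "color", size = "carat", shape = "cut")) +
  labs(title = strTitle)
g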
## Warning: Using shapes for an ordinal variable is not advised
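aes_string() is deprecated in recent ggplot2 versions, so here is a sketch of the same string-driven mapping using the .data pronoun. It assumes ggplot2 3.0.0 or later. The variables strX, strY and strCol are illustrative, and a 200-row slice of diamonds stands in for dt1.

```r
library(ggplot2)

# Field names as strings (illustrative, e.g. selected by a user)
strX <- "depth"; strY <- "price"; strCol <- "color"

# .data[[...]] looks the column up by name inside the plot's data
gData <- ggplot(head(diamonds, 200)) + theme_bw() +
  geom_point(aes(x = .data[[strX]], y = .data[[strY]], col = .data[[strCol]])) +
  labs(x = strX, y = strY, col = strCol)
gData
```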

# Another way to refer to variables inside ggplot functions: the get() function
g1 <- ggplot(dt1) + theme_bw() +
  geom_line(aes(x=get("depth"), y=get("price"),col=get("color"))) +
  geom_point(aes(x=get("depth"), y=get("price"),col=get("color"), size=get("carat"))) + 
  facet_grid(get("cut") ~ .) +
  labs(title = strTitle)
g1
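With get(), ggplot2 uses the expression itself, e.g. get("color"), as the default legend and axis title. The facet variable is also hard-coded in the formula. The sketch below builds the facet formula from a string and sets the titles explicitly. The variable names are illustrative, and a 200-row slice of diamonds stands in for dt1.

```r
library(ggplot2)

strFacet <- "cut"
strCol   <- "color"

# Build the facet formula from a string (equivalent to cut ~ .)
fmlFacet <- as.formula(paste(strFacet, "~ ."))

gFacet <- ggplot(head(diamonds, 200)) + theme_bw() +
  geom_point(aes(x = get("depth"), y = get("price"), col = get(strCol))) +
  facet_grid(fmlFacet) +
  # Without labs(), titles would default to the expressions, e.g. 'get(strCol)'
  labs(x = "depth", y = "price", col = strCol)
gFacet
```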

Method 2: Putting code in a separate .R file

Instead of putting the R code directly in the R Markdown file, it can be kept in a separate .R file and run from there using source("auto_report_code1.R").
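Here is a minimal sketch of source() in action. The script is a hypothetical two-line stand-in for auto_report_code1.R, written to a temporary file. Passing an environment as 'local' keeps the script's objects separate from the global environment.

```r
# Hypothetical stand-in for auto_report_code1.R, written to a temporary file
strScript <- tempfile(fileext = ".R")
writeLines(c(
  'SAMPLE_SIZE = 5',
  'dt1 <- head(mtcars, SAMPLE_SIZE)'
), strScript)

# Run the script; its objects are created inside envReport
envReport <- new.env()
source(strScript, local = envReport)
ls(envReport)
```

In the report itself a plain source("auto_report_code1.R") is enough; adding echo = TRUE prints each line of the script as it runs.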

Converting static tables and graphs into interactive ones!

For the HTML report, the tables and graphs above can be converted into interactive ones with a single line of code!

Interactive table

An interactive table lets you browse and extract data from large and complex datasets efficiently. You can sort the table, filter it by column values (using the boxes at the top), and copy or export the data with the buttons above it (copy, CSV, Excel, PDF, print).

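dt1 %>% DT::datatable(
  rownames = F, filter = "top",
  extensions = 'Buttons',
  # dom layout: B = buttons, l = length menu, f = search box, r = processing, t = table, i = info, p = pagination
  options = list(dom = 'Blfrtip', buttons = c('copy', 'csv', 'excel', 'pdf', 'print'))
)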
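DT's format*() helpers can style the table widget further, and like the rest of this report they refer to columns by quoted name. This sketch assumes the DT package is installed; otherwise the block does nothing. A 20-row slice of diamonds stands in for dt1, and prices in diamonds are in US dollars.

```r
library(data.table)

dtView <- as.data.table(head(ggplot2::diamonds, 20))  # small stand-in for dt1

tblView <- NULL
if (requireNamespace("DT", quietly = TRUE)) {
  tblView <- DT::datatable(dtView, rownames = FALSE, filter = "top")
  # Show 'price' as whole dollars; the column is referenced by its quoted name
  tblView <- DT::formatCurrency(tblView, "price", currency = "$", digits = 0)
}
tblView
```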

Interactive plot

An interactive plot lets you analyze and visualize large and complex datasets efficiently:

  • Hover over a line or a point in the plot to see its details.
  • Zoom in on any region by dragging the mouse over the area of interest. Double-clicking the plot restores the original view.
  • Click or double-click a factor level in the legend on the right to exclude or include the related data: a single click removes/adds the selected level, while a double-click removes/adds all levels except the selected one.
  • Click the icons in the top-right corner for additional actions (e.g. saving images, help).
plotly::ggplotly(g)
## Warning: Using shapes for an ordinal variable is not advised
plotly::ggplotly(g1)
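By default ggplotly() shows every mapped aesthetic on hover. This sketch uses the 'tooltip' argument to restrict the hover text to the x and y values. It assumes plotly is installed, and a small plot stands in for g.

```r
library(ggplot2)

gSmall <- ggplot(head(diamonds, 200), aes(x = depth, y = price, col = color)) +
  theme_bw() + geom_point()

plyTip <- NULL
if (requireNamespace("plotly", quietly = TRUE)) {
  # 'tooltip' lists the aesthetics to show on hover (here only x and y)
  plyTip <- plotly::ggplotly(gSmall, tooltip = c("x", "y"))
}
plyTip
```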